AFO 114 – Record matching
Use this AFO to find and process matching records when merging records. Use the Record match menu options to:
· Detect duplicate records when records are loaded or merged into a database file.
· Remove duplicate records from record sets, for example from the record sets produced by a Z39.50 search.
Record matching allows you to decide if new and duplicate records are to be accepted, rejected, or set aside in a savelist.
The record matching process uses a matching profile. You can define a number of matching profiles. However, the record matching process can use only one matching profile at a time.
The following specialised terms are used:
· Matching profiles: a set of rules for searching for matching records in a set of records and for deciding what to do with them.
· Matching files: a type of index containing unique record keys.
· Key definition profiles: a definition of the key used to determine whether records are identical and matching.
· (Re)create matching files: the option that builds or rebuilds a matching file.
After starting this AFO the following submenu is displayed:
A matching profile is a set of rules for matching records. Use the following steps in order to create a matching profile:
· On the “ ”, use “ ” to create a profile for matching imported or loaded records.
· On the “ ”, use “ ” to create a matching file to be used in a matching profile.
· On the “ ”, use “ ” to create a key definition profile to be used in a matching file.
Together, these steps give you an overview of all duplicate titles in the matching file. The matching file can also be used to compare files that are to be imported, although a normal index can be used for this as well.
Comparing titles is necessary to detect duplicates during importing or merging processes. Comparison is done via indexes or predefined keys. The result of a comparison can be zero, one or multiple duplicates.
New records as well as duplicates can be accepted, rejected or set aside in a savelist with a special status to be assessed later.
Records are compared not only during imports. A user can also run a comparison as a separate action to assess a file, or to compare two different sets of records, for instance after a Z39.50 search. In this case so-called matching files are used.
A key consists of elements from a record. It is a string of characters which can look like “financial^jone^elsev^2004” (random example). The elements of the key are derived from the record; in the example these are the first word of the title, the first four characters of the author name, the first five characters of the publisher name and the year of publication. Because elements of a key can be derived from repeatable fields (like author name), multiple keys per title are possible. For comparison all available keys are used.
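The way a repeatable field leads to multiple keys can be illustrated with a minimal Python sketch. This is not the system's own implementation: the record layout (a dictionary of lists), the function names and the choice to represent a missing field as an empty element are all assumptions made for illustration.

```python
from itertools import product


def first_word(text):
    """Return the first whitespace-separated word, lowercased."""
    words = text.split()
    return words[0].lower() if words else ""


def build_keys(record):
    """Build one key per combination of repeatable field values.

    `record` is a hypothetical dict of lists; the field names are
    illustrative and do not correspond to real field tags.
    """
    titles = [first_word(t) for t in record.get("title", [])]
    authors = [a[:4].lower() for a in record.get("author", [])]
    publishers = [p[:5].lower() for p in record.get("publisher", [])]
    years = [y[:4] for y in record.get("year", [])]
    # Assumption: a missing field contributes an empty element.
    parts = [values or [""] for values in (titles, authors, publishers, years)]
    return ["^".join(combo) for combo in product(*parts)]


record = {
    "title": ["Financial accounting"],
    "author": ["Jones, A.", "Smith, B."],  # repeatable field: two authors
    "publisher": ["Elsevier"],
    "year": ["2004"],
}
keys = build_keys(record)
print(keys)  # ['financial^jone^elsev^2004', 'financial^smit^elsev^2004']
```

Because the record has two authors, two keys are produced, and both would be used for comparison.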
After choosing this option the following overview screen is displayed:
: choose this option to create a new key definition profile.
: select a profile and then this option to modify the general properties.
: select a profile and then this option to delete it. The system will ask for confirmation.
: select a profile and then this option to modify the linked definition. In that case the following screen will be displayed:
: choose this option to create a new definition.
: select an item and then this option to modify the general properties.
: select an item and then this option to delete it.
In the example below, the first 20 characters of the word in subfield 200$a (main title) are taken. This data is normalised (everything is converted to uppercase and punctuation, except spaces, is stripped). The first 4 characters of subfield 700$b (author last name) are taken and also normalised. Then the first 5 characters of the publisher name are taken. Finally, the first 4 characters of the publication year are taken, numeric only, so that the actual date remains and additions like cop. or ed. are ignored.
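A key definition of this kind could be sketched in Python as a list of rules. This is only an illustration under stated assumptions: "the word" is read as the first word of the field, filtering is applied before truncating, the publisher name is normalised in the same way as the other fields, the "^" separator is borrowed from the earlier key example, and the labels "publisher" and "year" are placeholders because the text does not name those subfields.

```python
def normalise(text):
    """Uppercase and strip punctuation, keeping letters, digits and spaces."""
    return "".join(ch for ch in text.upper() if ch.isalnum() or ch == " ")


def digits_only(text):
    """Keep numeric characters only, so 'cop. 2004' becomes '2004'."""
    return "".join(ch for ch in text if ch.isdigit())


# Each rule: (field label, scope, filter function, number of characters kept).
# The labels "publisher" and "year" are hypothetical placeholders.
KEY_DEFINITION = [
    ("200$a", "word", normalise, 20),
    ("700$b", "field", normalise, 4),
    ("publisher", "field", normalise, 5),
    ("year", "field", digits_only, 4),
]


def build_key(record, definition=KEY_DEFINITION, separator="^"):
    """Apply each rule in turn and join the resulting elements."""
    parts = []
    for field, scope, clean, length in definition:
        value = clean(record.get(field, ""))
        if scope == "word":
            # Assumption: "the word" means the first word of the field.
            words = value.split()
            value = words[0] if words else ""
        parts.append(value[:length])
    return separator.join(parts)


record = {
    "200$a": "Financial accounting: an introduction",
    "700$b": "Jones",
    "publisher": "Elsevier",
    "year": "cop. 2004",
}
key = build_key(record)
print(key)  # FINANCIAL^JONE^ELSEV^2004
```

Keeping the rules in a list rather than in code mirrors the idea of a definition that can be modified item by item.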
The system supports creation of the keys and saving them, so they can be used again and again, for instance when importing records, searching titles via Z39.50 etc.
But the process can also be done “on the fly” without permanently saving the keys.
A matching profile detects identical records and determines how the system must react to records that have identical keys. The rules saved in the profiles determine what must be done with the records. These rules can be something like this:
· “If multiple records with an identical key are found, merge them”.
· “If a record is imported and no identical keys are found, create a new record”.
To summarise: a rule determines what must be done in case there are zero, one or multiple matches with a key.
The profile screen looks like this:
These import profiles can be used elsewhere, for instance for importing titles.
: choose this option to create a new matching profile definition.
: select a profile and then this option to delete it.
: select a profile and then this option to modify the general properties.
: select a profile and then this option to modify the linked definition. In that case the following screen will be displayed:
The ISBN is the criterion for finding duplicate records in this example. Because there is an index on ISBN, a key is not strictly necessary. We choose the ISBN index and set the actions as follows: 0 matches: new record; 1 match: update record; multiple matches: put records in a savelist.
The effect is that when no ISBN is found in the index, a new title record is created. If there is one match, the new title is merged with the existing title. If there are multiple matches, the system cannot determine with which record the incoming record must be merged, so the record is put in a savelist. If you also choose ‘update record’ for multiple matches, the system will merge the incoming record with the first matching record encountered in the ISBN index.
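The decision logic of such a profile can be summarised in a short Python sketch. It is an illustration only: the profile layout, the function name `decide`, the record numbers and the ISBN values are all made up, and the index is modelled as a plain dictionary from ISBN to a list of record ids.

```python
NEW_RECORD = "new record"
UPDATE_RECORD = "update record"
SAVELIST = "savelist"

# Hypothetical profile: one action per outcome (0, 1 or multiple matches).
ISBN_PROFILE = {"zero": NEW_RECORD, "one": UPDATE_RECORD, "multiple": SAVELIST}


def decide(isbn, isbn_index, profile=ISBN_PROFILE):
    """Return (action, target record id or None) for an incoming record."""
    matches = isbn_index.get(isbn, [])
    if not matches:
        return profile["zero"], None
    if len(matches) == 1:
        return profile["one"], matches[0]
    action = profile["multiple"]
    # With 'update record' on multiple matches, the first match in the index is used.
    target = matches[0] if action == UPDATE_RECORD else None
    return action, target


# Made-up index content: ISBN -> ids of existing records.
isbn_index = {
    "9780000000001": [101],
    "9780000000002": [102, 205],
}

print(decide("9780000000003", isbn_index))  # ('new record', None)
print(decide("9780000000001", isbn_index))  # ('update record', 101)
print(decide("9780000000002", isbn_index))  # ('savelist', None)
```

Changing the "multiple" action to `UPDATE_RECORD` makes the last call return record 102, the first match in the index.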
: choose this option to create a new definition.
: select a file and then this option to modify the general properties.
: select a file and then this option to delete it.
A matching file is a pseudo index with unique keys for the records. The overview screen shows all files created (via AFO 114, option 4). From here you can delete them. Viewing/modifying the contents is done in AFO 115.
The files are shown in a list:
Name: Name of the file
Comment: Explanatory description (what the file is for)
Key definition profile: Profile used for this file
Application: Bibliographic (titles) or authorities (e.g. authors)
Database: Valid database name
Savelist: Name of processed savelist (empty when the complete database has been processed)
Status: Ready or Processing
Keys: Number of keys (number/merged/deleted)
Records: Number of records (number/merged/deleted)
When the option “use for merge” has been selected when creating the matching file, the file can be used in AFO 115 to merge records. In that case only duplicate records will be saved in the file.
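As a rough mental model, a matching file can be pictured as a mapping from each key to the records that produce it. The Python sketch below shows this, including the effect of “use for merge”. The function names, the record layout and the sample data are assumptions for illustration, not the actual storage format.

```python
from collections import defaultdict


def build_matching_file(records, make_key, use_for_merge=False):
    """Map each key to the ids of the records that produce it.

    `records` is an iterable of (record_id, record) pairs and `make_key`
    is any function returning a key string; both are illustrative.
    """
    index = defaultdict(list)
    for record_id, record in records:
        index[make_key(record)].append(record_id)
    if use_for_merge:
        # Keep only keys shared by two or more records, i.e. the duplicates.
        return {key: ids for key, ids in index.items() if len(ids) > 1}
    return dict(index)


def by_isbn(record):
    """Hypothetical key function: use the ISBN as the key."""
    return record["isbn"]


records = [
    (1, {"isbn": "111"}),
    (2, {"isbn": "222"}),
    (3, {"isbn": "111"}),
]
full = build_matching_file(records, by_isbn)
duplicates = build_matching_file(records, by_isbn, use_for_merge=True)
print(full)        # {'111': [1, 3], '222': [2]}
print(duplicates)  # {'111': [1, 3]}
```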
: choose this option to create a new file.
: select a line and then this option to modify the general properties.
: select a line and then this option to delete it.
Refresh: select this option to refresh the screen. Any files added by another user in the meantime will then be shown as well.
A definition for a matching file is made in AFO 114 under option 2 and then built with option 4. After choosing this option an input form will be displayed:
After (re)creating you can find the file under “Matching files” in AFO 114.
· Document control - Change History
Version | Date | Change description | Author
1.0 | unknown | Creation |
2.0 | May 2006 | Various revisions. Delivered as part of build 17 set |